Support Vector Machines for Text Categorization Based on Latent Semantic Indexing

Author

Yan Huang

Abstract

Text Categorization (TC) is an important component of many information organization and information management tasks. Two key issues in TC are feature coding and classifier design. This paper describes an approach to text categorization that applies Support Vector Machines (SVMs) to features obtained by Latent Semantic Indexing (LSI). Latent Semantic Indexing [1][2] is a method for selecting informative subspaces of feature spaces, with the goal of obtaining a compact representation of documents. Support Vector Machines [3] are powerful machine learning systems that combine remarkable performance with an elegant theoretical framework. SVMs fit the text categorization task well because of the special properties of text itself. Experiments show that the LSI+SVMs framework improves clustering performance by focusing the attention of the Support Vector Machines on informative subspaces of the feature space.
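The abstract gives no implementation details, so the following is only a minimal sketch of the general LSI-then-SVM idea, not the paper's method. Everything concrete in it is an assumption made for illustration: the six-document toy corpus, the two category names, the choice of k = 2 latent dimensions, the power-iteration eigen-solver standing in for a proper SVD routine, and the plain subgradient-descent trainer standing in for a real SVM solver.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_eigenpairs(G, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix: power iteration plus deflation."""
    n = len(G)
    G = [row[:] for row in G]  # work on a copy
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # fixed start keeps the run deterministic
        for _ in range(iters):
            w = [dot(row, v) for row in G]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in G])  # Rayleigh quotient
        pairs.append((lam, v))
        for i in range(n):  # deflate: subtract the component just found
            for j in range(n):
                G[i][j] -= lam * v[i] * v[j]
    return pairs

def fit_lsi(X, k):
    """Return k term-space directions (right singular vectors of the doc-term matrix X)."""
    gram = [[dot(a, b) for b in X] for a in X]  # document-by-document matrix X X^T
    basis = []
    for lam, u in top_eigenpairs(gram, k):
        # right singular vector: v = X^T u / sigma, with sigma = sqrt(lam)
        basis.append([sum(u[d] * X[d][t] for d in range(len(X))) / math.sqrt(lam)
                      for t in range(len(X[0]))])
    return basis

def embed(x, basis):
    """Project a term-count vector onto the latent directions and length-normalize."""
    z = [dot(x, v) for v in basis]
    norm = math.sqrt(dot(z, z)) or 1.0  # guard against documents with no known terms
    return [c / norm for c in z]

def train_linear_svm(Z, y, lam=0.01, lr=0.1, epochs=200):
    """Linear soft-margin SVM via subgradient descent on the L2-regularized hinge loss."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        for z, label in zip(Z, y):
            violated = label * (dot(w, z) + b) < 1.0
            w = [wi * (1.0 - lr * lam) for wi in w]  # step on the L2 penalty
            if violated:  # hinge loss is active for this example
                w = [wi + lr * label * zi for wi, zi in zip(w, z)]
                b += lr * label
    return w, b

def predict(z, w, b):
    return 1 if dot(w, z) + b >= 0 else -1

# Hypothetical toy corpus: +1 = "sports", -1 = "finance".
train_docs = [
    "team wins match goal",
    "player scores goal match",
    "team player coach match",
    "bank raises interest rate",
    "market stock bank rate",
    "stock market interest investor",
]
labels = [1, 1, 1, -1, -1, -1]
names = {1: "sports", -1: "finance"}

vocab = sorted({t for doc in train_docs for t in doc.split()})

def vectorize(doc):
    words = doc.split()
    return [words.count(t) for t in vocab]

X = [vectorize(d) for d in train_docs]
basis = fit_lsi(X, k=2)
w, b = train_linear_svm([embed(x, basis) for x in X], labels)

train_pred = [predict(embed(x, basis), w, b) for x in X]
test_docs = ["coach scores goal", "investor raises stock"]
test_pred = [predict(embed(vectorize(d), basis), w, b) for d in test_docs]
print([names[p] for p in test_pred])
```

The classifier here only ever sees the two latent coordinates of each document, never the raw term counts, which is the sense in which LSI focuses the SVM on a compact subspace. In practice a library SVD and a dedicated SVM solver would replace the hand-written routines.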

Similar resources

Modeling Category Structures with a Kernel Function

We propose one type of TOP (Tangent vector Of the Posterior log-odds) kernel and apply it to text categorization. In a number of categorization tasks including text categorization, negative examples are usually more common than positive examples and there may be several different types of negative examples. Therefore, we construct a TOP kernel, regarding the probabilistic model of negative exam...

An Empirical Comparison of Text Categorization Methods

In this paper we present a comprehensive comparison of the performance of a number of text categorization methods on two different data sets. In particular, we evaluate the Vector and Latent Semantic Analysis (LSA) methods, a classifier based on Support Vector Machines (SVM) and the k-Nearest Neighbor variations of the Vector and LSA models. We report the results obtained using the Mean Recipro...

Construction of supervised and unsupervised learning systems for multilingual text categorization

Due to the availability of a huge amount of textual data from a variety of sources, users of internationally distributed information regions need effective methods and tools that enable them to discover, retrieve and categorize relevant information, in whatever language and form it may have been stored. This drives a convergence of numerous interests from diverse research communities focusing o...

Supervised Locality Preserving Indexing for Text Categorization

A major characteristic of text categorization problems is the prohibitively high dimensionality of the feature space. Most discrimination methods cannot work under such conditions, so Latent Semantic Indexing (LSI) has been adopted to solve this problem. However, LSI is not an optimal representation for the text categorization task, mainly for two reasons: first, the discriminative categorical info...

Support Vector Machines Based on a Semantic Kernel for Text Categorization

We propose to solve a text categorization task using a new metric between documents, based on a priori semantic knowledge about words. This metric can be incorporated into the definition of radial basis kernels of Support Vector Machines or directly used in a K-nearest neighbors algorithm. Both SVM and KNN are tested and compared on the 20 newsgroups database. Support Vector Machines provide th...

Publication date: 2001
